Action Recognition in Video Using Sparse Coding and Relative Features
This work presents an approach to category-based action recognition in video
using sparse coding techniques. The proposed approach includes two main
contributions: i) A new method to handle intra-class variations by decomposing
each video into a reduced set of representative atomic action acts or
key-sequences, and ii) A new video descriptor, ITRA: Inter-Temporal Relational
Act Descriptor, that exploits the power of comparative reasoning to capture
relative similarity relations among key-sequences. To obtain key-sequences, we
introduce a loss function that, for each video, leads to the identification of
a sparse set of representative key-frames capturing both relevant
particularities arising in the input video and relevant generalities arising
in the complete class collection. To obtain the ITRA descriptor, we introduce
a novel scheme to quantify relative intra- and inter-class similarities among
local temporal patterns arising in the videos. The resulting ITRA descriptor
proves highly effective at discriminating among action categories. As a
result, the proposed approach achieves remarkable action recognition
performance on several popular benchmark datasets, outperforming alternative
state-of-the-art techniques by a large margin.
Comment: Accepted to CVPR 201
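The abstract does not spell out the loss-based key-frame selection, but the idea of extracting a sparse, representative subset of frames can be illustrated with a greedy farthest-point sampling sketch. The per-frame feature vectors, the Euclidean distance, and the greedy spread criterion below are all assumptions for illustration, not the authors' formulation:

```python
# Hypothetical sketch: pick a sparse set of representative key-frames by
# greedy farthest-point sampling over per-frame feature vectors. This is an
# illustrative stand-in for the paper's loss-based key-sequence selection.

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def select_key_frames(frames, k):
    """Greedily select k frames that are maximally spread out in feature space."""
    selected = [0]  # start from the first frame
    while len(selected) < k:
        # pick the frame farthest from all already-selected frames
        best = max(
            (i for i in range(len(frames)) if i not in selected),
            key=lambda i: min(dist(frames[i], frames[j]) for j in selected),
        )
        selected.append(best)
    return sorted(selected)

# Toy "video": 6 frames whose 2-D features form three loose clusters.
frames = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 9.0), (0.1, 9.1)]
print(select_key_frames(frames, 3))  # → [0, 3, 5], one frame per cluster
```

In the paper, the features would come from the video representation and the selection would balance per-video particularities against class-level generalities via the proposed loss, rather than raw geometric spread.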
Comparing Neural and Attractiveness-based Visual Features for Artwork Recommendation
Advances in image processing and computer vision in the latest years have
brought about the use of visual features in artwork recommendation. Recent
works have shown that visual features obtained from pre-trained deep neural
networks (DNNs) perform very well for recommending digital art. Other recent
works have shown that explicit visual features (EVF) based on attractiveness
can perform well in preference prediction tasks, but no previous work has
compared DNN features versus specific attractiveness-based visual features
(e.g. brightness, texture) in terms of recommendation performance. In this
work, we study and compare the performance of DNN and EVF features for the
purpose of physical artwork recommendation using transactional data from
UGallery, an online store of physical paintings. In addition, we perform an
exploratory analysis to understand if DNN embedded features have some relation
with certain EVF. Our results show that DNN features outperform EVF, that
certain EVF features are better suited than others for physical artwork
recommendation, and that certain neurons in the DNN might partially encode
visual features such as brightness, providing an opportunity for explaining
recommendations based on visual neural models.
Comment: DLRS 2017 workshop, co-located at RecSys 201
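The claim that some DNN neurons may partially encode brightness can be probed by correlating a neuron's activations with a brightness score across items. A minimal sketch, with entirely synthetic activation and brightness values (the real analysis would use UGallery artworks and actual network activations):

```python
# Hypothetical probe: does a given DNN neuron track image brightness?
# We correlate the neuron's activation across artworks with each artwork's
# mean brightness (an explicit visual feature). All values are made up.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

brightness = [0.2, 0.4, 0.5, 0.7, 0.9]       # mean luminance per artwork
activation = [0.15, 0.35, 0.55, 0.65, 0.95]  # one neuron's response per artwork

r = pearson(brightness, activation)
print(round(r, 3))  # a high r suggests the neuron partially encodes brightness
```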
An Efficient Point-Matching Method Based on Multiple Geometrical Hypotheses
Point matching in multiple images is an open problem in computer vision because of the numerous geometric transformations and photometric conditions that a pixel or point might exhibit in the set of images. Over the last two decades, different techniques have been proposed to address this problem. The most relevant are those that explore the analysis of invariant features. Nonetheless, their main limitation is that invariant analysis alone cannot reduce false alarms. This paper introduces an efficient point-matching method for two and three views, based on the combined use of two techniques: (1) the correspondence analysis extracted from the similarity of invariant features and (2) the integration of multiple partial solutions obtained from 2D and 3D geometry. The main strength and novelty of this method is the determination of the point-to-point geometric correspondence through the intersection of multiple geometrical hypotheses weighted by the maximum likelihood estimation sample consensus (MLESAC) algorithm. The proposal not only extends the methods based on invariant descriptors but also generalizes the correspondence problem to a perspective projection model in multiple views. The developed method has been evaluated on three types of image sequences: outdoor, indoor, and industrial. Our developed strategy discards most of the wrong matches and achieves remarkable F-scores of 97%, 87%, and 97% for the outdoor, indoor, and industrial sequences, respectively.
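A rough sketch of MLESAC-style hypothesis weighting: each candidate geometric hypothesis produces one residual per putative match, and hypotheses are scored by the likelihood of those residuals under a Gaussian-inlier / uniform-outlier mixture, keeping the best scorer. The residuals, mixing weight gamma, noise sigma, and outlier range nu below are illustrative assumptions, not values from the paper:

```python
import math

# Simplified MLESAC-style scoring of candidate geometric hypotheses.
# Lower negative log-likelihood means the hypothesis explains more matches
# as Gaussian inliers rather than uniform outliers.

def mlesac_score(residuals, gamma=0.5, sigma=1.0, nu=100.0):
    """Negative log-likelihood under the inlier/outlier mixture (assumed params)."""
    nll = 0.0
    for r in residuals:
        inlier = gamma * math.exp(-r * r / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        outlier = (1 - gamma) / nu
        nll -= math.log(inlier + outlier)
    return nll

# Two candidate hypotheses: the first fits most matches (small residuals).
h1 = [0.1, 0.2, 0.1, 0.3, 5.0]   # one outlier among good matches
h2 = [2.0, 3.0, 1.5, 2.5, 4.0]   # poor fit overall
best = min([h1, h2], key=mlesac_score)
print(best is h1)  # True: the better-fitting hypothesis wins
```

The paper combines many such weighted hypotheses, intersecting their partial solutions across two and three views; this sketch only shows the per-hypothesis scoring step.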
Our Deep CNN Face Matchers Have Developed Achromatopsia
Modern deep CNN face matchers are trained on datasets containing color
images. We show that such matchers achieve essentially the same accuracy on the
grayscale or the color version of a set of test images. We then consider
possible causes for deep CNN face matchers "not seeing color". Popular
web-scraped face datasets actually have 30 to 60% of their identities with one
or more grayscale images. We analyze whether this grayscale element in the
training set impacts the accuracy achieved, and conclude that it does not.
Further, we show that even with a 100% grayscale training set, comparable
accuracy is achieved on color or grayscale test images. Then we show that the
skin region of an individual's images in a web-scraped training set exhibits
significant variation in its mapping to color space. This suggests that
color, at least for web-scraped, in-the-wild face datasets, carries limited
identity-related information for training state-of-the-art matchers. Finally,
we verify that comparable accuracy is achieved from training using
single-channel grayscale images, implying that a larger dataset can be used
within the same memory limit, with a less computationally intensive early
layer.
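The single-channel takeaway can be illustrated with a standard luma conversion (ITU-R BT.601 weights, a common choice; the paper does not specify its conversion). The tiny 2x2 "image" below is synthetic:

```python
# Converting an RGB face image to single-channel grayscale using the
# ITU-R BT.601 luma weights. One channel instead of three is what allows
# a larger dataset to fit in the same training-memory budget.

def to_grayscale(rgb_image):
    """Map each (R, G, B) pixel to a single rounded luma value."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]

image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (128, 128, 128)]]
print(to_grayscale(image))  # → [[76, 150], [29, 128]]

# Per-pixel storage drops from 3 bytes to 1: three times as many images
# fit in the same memory limit.
print(3 // 1)  # → 3
```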
The impact of MEG source reconstruction method on source-space connectivity estimation: A comparison between minimum-norm solution and beamforming.
Despite numerous important contributions, the investigation of brain connectivity with magnetoencephalography (MEG) still faces multiple challenges. One critical aspect of source-level connectivity, largely overlooked in the literature, is the putative effect of the choice of the inverse method on the subsequent cortico-cortical coupling analysis. We set out to investigate the impact of three inverse methods on source coherence detection using simulated MEG data. To this end, thousands of randomly located pairs of sources were created. Several parameters were manipulated, including inter- and intra-source correlation strength, source size, and spatial configuration. The simulated pairs of sources were then used to generate sensor-level MEG measurements at varying signal-to-noise ratios (SNR). Next, the source-level power and coherence maps were calculated using three methods: (a) L2-Minimum-Norm Estimate (MNE), (b) Linearly Constrained Minimum Variance (LCMV) beamforming, and (c) Dynamic Imaging of Coherent Sources (DICS) beamforming. The performances of the methods were evaluated using Receiver Operating Characteristic (ROC) curves. The results indicate that beamformers perform better than MNE for coherence reconstructions if the interacting cortical sources consist of point-like sources. On the other hand, MNE provides better connectivity estimation than beamformers if the interacting sources are simulated as extended cortical patches, where each patch consists of dipoles with identical time series (high intra-patch coherence). However, the performance of the beamformers for interacting patches improves substantially if each patch of active cortex is simulated with only partly coherent time series (partial intra-patch coherence).
These results demonstrate that the choice of the inverse method impacts the results of MEG source-space coherence analysis, and that the optimal choice of the inverse solution depends on the spatial and synchronization profile of the interacting cortical sources. The insights revealed here can guide method selection and help improve data interpretation regarding MEG connectivity estimation.
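The ROC evaluation reduces to a familiar statistic: the area under the curve equals the probability that a truly coupled source pair receives a higher coherence estimate than a non-coupled pair (the Mann-Whitney formulation). A minimal sketch with synthetic coherence scores, not values from the study:

```python
# ROC evaluation sketch: AUC computed as the probability that a truly coupled
# source pair scores higher coherence than a non-coupled pair, via pairwise
# comparisons. The coherence scores below are synthetic.

def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparisons (ties count 0.5)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

coupled = [0.8, 0.7, 0.9, 0.6]    # coherence estimates at interacting pairs
uncoupled = [0.3, 0.5, 0.2, 0.7]  # coherence estimates at non-interacting pairs
print(auc(coupled, uncoupled))    # → 0.90625
```

Comparing such AUC values across MNE, LCMV, and DICS reconstructions, and across source configurations, is what lets the study rank the inverse methods.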